Integrated circuit comprising a hardware calculator and corresponding calculation method

ABSTRACT

In an embodiment an integrated circuit includes a hardware calculator configured to calculate in parallel a first output component Y_(n−1) of a first rank n−1 and a second output component Y_(n) of a second rank n which is higher than and consecutive to the first rank, according to the formula: Y_(m)=Σ_(k=0)^(N−1)b_(k)x_(m−k), in a series of operations, wherein the hardware calculator includes a first calculation path dedicated to the first output component Y_(n−1), a second calculation path dedicated to the second output component Y_(n), wherein, for each operation, a first register is configured to contain a pair of first factors {x_(i), x_(i−1)} corresponding to terms {b_(k)x_(m−k)}_([k;k+1])^(m=n−1) of an operation in the first path, a second register is configured to contain a pair of second factors {b_(j), b_(j+1)} corresponding to terms {b_(k)x_(m−k)}_([k;k+1])^(m=n−1) of the operation in the first path, and a third register is configured to contain a pair of second factors {b_(j+2), b_(j+3)} corresponding to terms {b_(k)x_(m−k)}_([k+2;k+3])^(m=n−1) of the next operation in the first path.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of French Application No. 2111263, filed on Oct. 22, 2021, which application is hereby incorporated herein by reference.

TECHNICAL FIELD

Embodiments and implementations relate to hardware calculators produced in integrated form, in particular ones adapted to perform parallel calculations of the output vector components of a convolution product.

BACKGROUND

Typically, and in particular in finite impulse response (“FIR”) filter algorithms, a convolution product between an input vector {x_(i)}_(0≤i≤M−1) and another input vector {b_(i)}_(0≤i≤N−1) results in an output vector {Y_(m)}_(0≤m≤M−1), each component of which is defined by the formula Y_(m)=Σ_(k=0)^(N−1)b_(k)x_(m−k).

Usually, the data of the input vector {x_(i)}_(0≤i≤M−1) are called “samples” and the data of the input vector {b_(i)}_(0≤i≤N−1) are called “taps”.
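The formula can also be evaluated directly in software, which gives a reference for the hardware schemes described in this document. The following is a minimal Python sketch and is not part of the described circuit. The function names `fir_output` and `fir` are hypothetical. The sketch assumes that samples x_(i) outside the input vector are taken as zero and computes one output component per sample.

```python
def fir_output(x, b, m):
    """Direct evaluation of Y_m = sum_{k=0}^{N-1} b_k * x_{m-k}.

    Assumption: samples x_i with i outside 0 <= i <= M-1 are taken as 0.
    """
    total = 0
    for k, b_k in enumerate(b):
        i = m - k
        if 0 <= i < len(x):  # out-of-range samples contribute nothing
            total += b_k * x[i]
    return total


def fir(x, b):
    """One output component per sample: Y_0 .. Y_{M-1}, N multiply-accumulates each."""
    return [fir_output(x, b, m) for m in range(len(x))]


# 4 samples, 2 taps (b_0 = 1, b_1 = 10)
print(fir([1, 2, 3, 4], [1, 10]))  # [1, 12, 23, 34]
```

Computing M components with N multiply-accumulates each gives the N×M count of multiplications-accumulations discussed in the summary.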

SUMMARY

Hardware calculators can be specifically dedicated to the multiplication-and-accumulation (“MAC”) calculations of the components Y_(m) of the output vector of such a convolution product. Using hardware calculators relieves the workload of a general-purpose calculation unit, typically a processor of a system-on-chip. It also reduces the calculation time of the output vector, called the inference time.

Optimised hardware calculators are capable, in one instruction, of simultaneously performing two multiplication-accumulation calculations, loading the next two samples x_(i) and loading the next two taps b_(i) for the next two multiplication-accumulation operations.

Thus, the number of multiplications-accumulations needed to calculate all the components of the output vector is equal to the product of the number of samples x_(i) and the number of taps b_(i), for example N×M with the dimensions given above. The number of sample and tap loadings is also equal to N×M.

Applications of convolution products executed by hardware calculators tend to increase the dimension N, which has a quadratic impact on the inference time of these calculations.

Multiplying the number of MAC-type calculators operating in parallel reduces the number of successive instructions. However, it also multiplies the number of input-data loadings required at each instruction. Loading samples and taps from memory can then become the factor that limits the time needed to calculate the convolution product.

Embodiments provide an optimization of multiplication-accumulation techniques in order to reduce the time required for the complete computation of the convolution product.

Various embodiments provide for performing, in each operation, four multiplications-accumulations with only two loadings of input data.

According to one embodiment, there is proposed in this regard an integrated circuit comprising a hardware calculator adapted to calculate in parallel a first output component Y_(n−1) of a first rank n−1 and a second output component Y_(n) of a second rank n which is higher than and consecutive to the first rank, according to the formula: Y_(m)=Σ_(k=0)^(N−1)b_(k)x_(m−k), in a series of operations. The hardware calculator includes a first calculation path dedicated to the first output component Y_(n−1) and a second calculation path dedicated to the second output component Y_(n). For each operation:

a first register is configured to contain a pair of first factors {x_(i), x_(i−1)} corresponding to the terms {b_(k)x_(m−k)}_([k;k+1])^(m=n−1) of said operation in the first path;

a second register is configured to contain a pair of second factors {b_(j), b_(j+1)} corresponding to the terms {b_(k)x_(m−k)}_([k;k+1])^(m=n−1) of said operation in the first path; and

a third register is configured to contain a pair of second factors {b_(j+2), b_(j+3)} corresponding to the terms {b_(k)x_(m−k)}_([k+2;k+3])^(m=n−1) of the next operation in the first path.

The two calculation paths are each configured to access the first register, the second register and the third register. In each operation they can thus use the first factors x_(m−k) and the second factors b_(k) at the corresponding position of the summation index 0≤k≤N−1 in said formula, for the rank m=n−1 or m=n of the respective output component Y_(n−1), Y_(n).

Thus, for each operation, the first calculation path uses the factors provided for this operation, contained in the first register and in the second register. The second calculation path uses the factors corresponding to the second rank m=n and to the corresponding position of the summation index 0≤k≤N−1. These factors are shared with the second factors of the operation in progress in the first path and with those of the next operation in the first path.

According to one embodiment, for each operation:

the first calculation path is configured to calculate and accumulate, in a first output register, the pair of two products {b_(k)x_(m−k)}_([k;k+1])^(m=n−1) between the first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j), b_(j+1)} contained in the second register;

the second calculation path is configured to calculate and accumulate, in a second output register, the pair of two products {b_(k)x_(m−k)}_([k;k+1])^(m=n) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+1), b_(j+2)} corresponding to the calculation of the second rank n, contained in the second register and in the third register;

the hardware calculator is configured to load, into the first register, a pair of first factors {x_(i−2), x_(i−3)} corresponding to the calculation of the next operation of the first path, and to load, into the second register, a pair of second factors {b_(j+4), b_(j+5)} corresponding to the calculation of the operation following the next operation of the first path.

Indeed, because the two calculation paths access the three input registers, only two loadings need to be carried out in each operation to supply the four multiplications-accumulations of the following operation.

According to another embodiment, an integrated circuit is proposed comprising a hardware calculator adapted to calculate in parallel a first output component Y_(n−1) of a first rank n−1 and a second output component Y_(n) of a second rank n which is higher than and consecutive to the first rank, according to the formula: Y_(m)=Σ_(k=0)^(N−1)b_(k)x_(m−k), in a series of operations. The hardware calculator includes:

a first calculation path dedicated to the first output component Y_(n−1);

a second calculation path dedicated to the second output component Y_(n);

a first register intended to contain a pair of first factors {x_(i), x_(i−1)};

a second register intended to contain a pair of second factors {b_(j), b_(j+1)}; and

a third register intended to contain a pair of second factors {b_(j+2), b_(j+3)}.

For each operation, the hardware calculator is configured to:

with the first path, calculate and accumulate, in a first output register, a pair of two products {b_(k)x_(m−k)}_([k;k+1])^(m=n−1) between the first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j), b_(j+1)} contained in the second register;

with the second path, calculate and accumulate, in a second output register, a pair of two products {b_(k)x_(m−k)}_([k;k+1])^(m=n) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+1), b_(j+2)} corresponding to the calculation of the second rank n, contained in the second register and in the third register;

load, into the first register, a pair of first factors {x_(i−2), x_(i−3)} corresponding to the calculation of the next operation of the first path;

load, into the second register, a pair of second factors {b_(j+4), b_(j+5)} corresponding to the calculation of the operation following the next operation of the first path.
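The four multiply-accumulates of one operation can be sketched behaviourally in Python. This is a minimal sketch under the following assumptions:

- A 2M-bit register is modelled as a (g, d) pair of halves.
- The right half d holds x_(i) and b_(j), as in the FIG. 3 example of the detailed description.
- `Pair` and `operation` are hypothetical names.
- The two loadings of the operation are not modelled here.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """A 2M-bit register modelled as two M-bit halves: left g and right d."""
    g: int
    d: int


def operation(out1, out2, first, second, third):
    """Four multiply-accumulates of one operation, reading three input registers.

    first  = Pair(g=x_(i-1), d=x_(i))      first factors (samples)
    second = Pair(g=b_(j+1), d=b_(j))      second factors of this operation
    third  = Pair(g=b_(j+3), d=b_(j+2))    second factors of the next operation
    Returns the updated first and second output registers.
    """
    # First path (rank n-1): x_(i)*b_(j) and x_(i-1)*b_(j+1)
    out1 = Pair(g=out1.g + first.g * second.g, d=out1.d + first.d * second.d)
    # Second path (rank n): same samples, taps shifted by one:
    # x_(i)*b_(j+1) and x_(i-1)*b_(j+2)
    out2 = Pair(g=out2.g + first.g * third.d, d=out2.d + first.d * second.g)
    return out1, out2


# x_(i) = 5, x_(i-1) = 7; b_(j)..b_(j+3) = 2, 3, 11, 13
out1, out2 = operation(Pair(0, 0), Pair(0, 0),
                       Pair(g=7, d=5), Pair(g=3, d=2), Pair(g=13, d=11))
print(out1, out2)  # Pair(g=21, d=10) Pair(g=77, d=15)
```

Four products are formed from three input registers. Only the first and second registers then need reloading.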

In other words, the two calculation paths are each configured to access the first register, the second register and the third register. In each operation they use the first factors x_(m−k) and the second factors b_(k) that are common to the ranks m=n−1 and m=n of the output components Y_(n−1), Y_(n). Each factor is used at the corresponding position of the summation index 0≤k≤N−1 in the formula for the rank of the respective output component.

Because the two multiplication-accumulation calculation paths can access the three registers, the hardware calculator is capable of calculating four multiplication-accumulation terms for two loadings of factors into said registers in each operation.

Indeed, loading the second factors {b_(j+4), b_(j+5)} corresponding to the calculation of the operation following the next operation of the first path allows, during the next operation:

having available the second factors {b_(j+4), b_(j+5)} (loaded into the register called “second register” during the previous operation), which will be used in the following operation by the first path but are used, at least partially (at least one of them), in this operation by the second path; and

having available the second factors {b_(j+2), b_(j+3)} (loaded into the register called “third register” during the operation prior to the previous operation), which are used in this operation by the first path and, at least partially (at least one of them), in this operation by the second path.

In fact, the designations “second” and “third” register are arbitrary. The integrated circuit physically comprises these two registers, and each can alternately fulfil the function of the second register and the function of the third register defined above. For example, suppose that during one operation one of the registers fulfils the function of the “second” register and the other that of the “third” register. During another operation (the next operation), the first one will then fulfil the function of the “third” register and the other that of the “second” register.

For example, in this regard, the hardware calculator is configured, in the series of operations, to switch the functions of the second register and of the third register for each successive operation.

According to one embodiment, the first register, the second register, the third register, the first output register and the second output register have a size of 2M bits, for example 64 bits (M=32). Each contains a pair of two data items encoded on M bits, for example 32 bits.
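Packing two M-bit data items into one 2M-bit register value can be sketched in Python. Here M is a half-register width and is unrelated to the number M of samples, so the sketch calls it `m_bits`. The bit placement (the right word d in the low-order half) and the unsigned encoding are assumptions, since the text specifies neither. `pack` and `unpack` are hypothetical helpers.

```python
def pack(g, d, m_bits=32):
    """Pack two unsigned m_bits-bit words into one 2*m_bits-bit register value.

    Assumption: the right word d occupies the low-order bits and the left
    word g the high-order bits.
    """
    mask = (1 << m_bits) - 1
    return ((g & mask) << m_bits) | (d & mask)


def unpack(reg, m_bits=32):
    """Inverse of pack: return the (g, d) halves of a 2*m_bits-bit value."""
    mask = (1 << m_bits) - 1
    return (reg >> m_bits) & mask, reg & mask


reg = pack(g=0x12345678, d=0x9ABCDEF0)
print(hex(reg))                        # 0x123456789abcdef0
print([hex(w) for w in unpack(reg)])   # ['0x12345678', '0x9abcdef0']
```

A single 2M-bit loading therefore supplies two M-bit factors at once.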

According to one embodiment, the hardware calculator includes a selection circuit configured to distribute the accesses to the second register and to the third register between the first calculation path and the second calculation path. The first path then has access to the second factors {b_(j), b_(j+1)} of said operation, contained in the second register. The second path has access to the second factors {b_(j+1), b_(j+2)} of said operation, contained in the second register and in the third register.

In other words, unlike in conventional hardware calculators, the input registers are not dedicated to a single calculation path. Instead, they are shared and distributed among the different calculation paths via the selection circuit.

According to one embodiment, the hardware calculator is further adapted to calculate in parallel a third output component Y_(n+1) of a third rank n+1, which is higher than and consecutive to the second rank n, according to the same formula, the hardware calculator including a third calculation path dedicated to the third output component Y_(n+1),

and in which, for each operation, the hardware calculator is further configured to:

with the third path, calculate and accumulate, in a third output register, a pair of two products {b_(k)x_(m−k)}_([k;k+1])^(m=n+1) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+2), b_(j+3)} corresponding to the calculation of the third rank n+1, contained in the third register.

Thus, with a third calculation path that is also configured to access the first register, the second register and the third register, each operation can use first factors x_(m−k) and second factors b_(k) that are also common to the rank m=n+1. Each factor is used at the corresponding position of the summation index 0≤k≤N−1 in said formula.
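The index bookkeeping behind this sharing can be checked with a few lines of Python. For rank m, the term b_(k)x_(m−k) pairs x_(i) with k=m−i and x_(i−1) with k=m−i+1. The hypothetical helper `tap_indices` applies this to the ranks n−1, n and n+1 of the three paths. With j=n−1−i, the three paths need the taps {b_(j), b_(j+1)}, {b_(j+1), b_(j+2)} and {b_(j+2), b_(j+3)}. All of these lie in the second and third registers.

```python
def tap_indices(i, n, paths=3):
    """Summation indices k used with the shared samples (x_(i), x_(i-1)).

    Path p (p = 0, 1, 2) computes rank m = n - 1 + p; for that rank the term
    b_k * x_(m-k) uses x_(i) at k = m - i and x_(i-1) at k = m - i + 1.
    """
    return {m: (m - i, m - i + 1) for m in range(n - 1, n - 1 + paths)}


# First operation of the series: the shared samples are x_(n-1), x_(n-2)
print(tap_indices(i=9, n=10))  # {9: (0, 1), 10: (1, 2), 11: (2, 3)}
```

The result contains only the four tap indices 0 to 3. Six multiply-accumulates therefore need no tap loading beyond the two registers already present.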

Because the three multiplication-accumulation calculation paths can access the three registers, the hardware calculator is capable of calculating six multiplication-accumulation terms for two loadings of factors into said registers in each operation.

Indeed, loading the second factors {b_(j+4), b_(j+5)} corresponding to the calculation of the operation following the next operation of the first path further allows, during the next operation:

having available the second factors {b_(j+4), b_(j+5)} (loaded into the register called “second register” during the previous operation), which will be used in the next operation by the first path, are used at least partially in this operation by the second path, and are also used in this operation by the third path.

According to one embodiment, the integrated circuit defined above incorporates a digital signal processor.

According to another aspect, there is proposed a method, implemented within a hardware calculator, of parallel calculation of a first output component Y_(n−1) of a first rank n−1 and of a second output component Y_(n) of a second rank n which is higher than and consecutive to the first rank, according to the formula: Y_(m)=Σ_(k=0)^(N−1)b_(k)x_(m−k), in a series of operations, in which, for each operation:

the calculation of the first component Y_(n−1) comprises a calculation and an accumulation of two products {b_(k)x_(m−k)}_([k;k+1])^(m=n−1) between first factors {x_(i), x_(i−1)} contained in a first register and second factors {b_(j), b_(j+1)} contained in a second register;

the calculation of the second component Y_(n) comprises a calculation and an accumulation of two products {b_(k)x_(m−k)}_([k;k+1])^(m=n) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+1), b_(j+2)} corresponding to the calculation of the second rank n, contained in the second register and in a third register;

a loading, into the first register, of a pair of first factors {x_(i−2), x_(i−3)} corresponding to the calculation of the next operation;

a loading, into the second register, of a pair of second factors {b_(j+4), b_(j+5)} corresponding to the calculation of the operation following the next operation of the first path.

According to one embodiment, the method comprises, in the series of operations, a switching of the functions of the second register and of the third register for each successive operation.

According to one embodiment, the first register, the second register, the third register, the first output register and the second output register have a size of 2M bits and each contain a pair of two data items encoded on M bits.

According to one embodiment, the accesses to the second register and to the third register are distributed between two calculation paths. A first calculation path, dedicated to the first output component Y_(n−1), has access to the second factors {b_(j), b_(j+1)} of said operation, contained in the second register. A second calculation path, dedicated to the second output component Y_(n), has access to the second factors {b_(j+1), b_(j+2)} of said operation, contained in the second register and in the third register.

According to one embodiment, the method further comprises a parallel calculation of a third output component Y_(n+1) of a third rank n+1, which is higher than and consecutive to the second rank n, according to the same formula, in which, for each operation:

the calculation of the third component Y_(n+1) comprises a calculation and an accumulation of two products {b_(k)x_(m−k)}_([k;k+1])^(m=n+1) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+2), b_(j+3)} corresponding to the calculation of the third rank n+1, contained in the third register.

BRIEF DESCRIPTION OF THE DRAWINGS

Other advantages and features of the invention will become apparent on examining the detailed description of embodiments and implementations, which are in no way limiting, and the appended drawings, in which:

FIG. 1 illustrates a hardware calculator according to embodiments;

FIG. 2 illustrates an implementation principle according to embodiments;

FIG. 3 illustrates a hardware calculator with respect to a first operation according to embodiments;

FIG. 4 illustrates an implementation principle of a second operation according to embodiments;

FIG. 5 illustrates a hardware calculator with respect to a second operation after the first operation according to embodiments;

FIG. 6 illustrates a further implementation principle according to embodiments; and

FIG. 7 illustrates an exemplary embodiment of the hardware calculator.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

FIG. 1 illustrates an example of a hardware calculator CAL particularly adapted to calculate convolution products. The hardware calculator can be produced within an integrated circuit of a digital signal processor DSP.

The components of an output vector of a convolution product can be expressed by the general formula Y_(m)=Σ_(k=0)^(N−1)b_(k)x_(m−k). This is a sum in which each term is the product of two factors, and the factors are the input data of the convolution. The first factors x_(m−k) are for example components of an input vector, otherwise called “samples”. The second factors b_(k) are for example coefficients, otherwise called “taps”.

The hardware calculator CAL comprises two parallel calculation paths VC1, VC2. Each is configured to perform multiplication-accumulation operations MACg, MACd in a series of successive operations, so that the terms accumulated in the output registers REGout progressively build up an output component Y_(m) by the end of the series. Each operation comprises a multiplication of the data x_(m−k), b_(k) contained in respective input registers REGin, and an accumulation of the products in an output register REGout. Furthermore, during an operation, the input data of the next operation can be loaded into the input registers REGin.

The calculation paths VC1, VC2 advantageously include a first optimisation of the use of the input registers REGin and output registers REGout, which limits the number of input-data loadings in the series of operations. Indeed, the registers REGin and REGout have a size of 2M bits, for example 64 bits (M=32), and each contain two distinct data items encoded on M bits, for example 32 bits. By convention, in each 64-bit register, the 32-bit word stored on the left in the drawings is designated by the reference “g”. The 32-bit word stored on the right in the drawings is designated by the reference “d”.

Thus, in each calculation path VC1, VC2, a single loading of a 2M-bit word into an input register REGin loads two first input factors at once:

one for a first multiplication-accumulation MACg, performed in the left portion “g” of the input and output registers REGin, REGout;

the other for a second multiplication-accumulation MACd, performed in parallel in the right portion “d” of the registers REGin, REGout.

For example, the calculations in the left “g” and right “d” portions of the registers REGin, REGout can be organised as follows. The terms of the sum Σ_(k=0)^(N−1)b_(k)x_(m−k) having an even index k are accumulated in the right portion “d” of the output register REGout. The terms having an odd index k are accumulated in the left portion “g” of the output register REGout.
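The even/odd split can be illustrated in Python. This sketch accumulates the terms of one component Y_(m) into a (g, d) pair following the convention just described. The helper name `split_accumulate` is hypothetical, and out-of-range samples are assumed to be zero. The final addition g + d is also an assumption, since the passage does not say how the two halves are combined.

```python
def split_accumulate(x, b, m):
    """Accumulate the terms b_k * x_(m-k) of Y_m in two halves of an output register.

    Even k goes to the right half d, odd k to the left half g.
    Assumption: samples outside the input vector are 0.
    """
    g = d = 0
    for k, b_k in enumerate(b):
        i = m - k
        term = b_k * x[i] if 0 <= i < len(x) else 0
        if k % 2 == 0:
            d += term  # even summation index -> right portion "d"
        else:
            g += term  # odd summation index -> left portion "g"
    return g, d


x = [3, 1, 4, 1, 5, 9]
b = [2, 7, 1, 8]
g, d = split_accumulate(x, b, 5)
# Y_5 = 2*9 + 7*5 + 1*1 + 8*4 = 18 + 35 + 1 + 32 = 86
print(g, d, g + d)  # 67 19 86
```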

Thus, independently in each of the paths VC1, VC2 and in each operation, two 64-bit loadings are sufficient to carry out two multiplications-accumulations MACg, MACd.

Furthermore, the two calculation paths VC1, VC2 can advantageously share one of the input registers REGin, for example the input register R3 storing the input data x_(n−1), x_(n−2). Each path then performs the multiplication-accumulation operations with the coefficients b_(k) whose indices k correspond to these input data in the sum Σ_(k=0)^(N−1)b_(k)x_(m−k). Here m is the rank of the output component Y_(m) computed by that calculation path VC1, VC2.

For example, the first path VC1 calculates the first output component Y_(n−1) of a first rank n−1, and the second path VC2 calculates the second output component Y_(n) of a second rank n.

For the first path VC1 at rank m=n−1, consider the products {b_(k)x_(m−k)}_(k)^(m=n−1) for the input data x_(n−1), x_(n−2). Their indices satisfy (n−1)−k=n−1 and (n−1)−(k+1)=n−2, so k=0 and k+1=1. The corresponding coefficients b₀ and b₁ are loaded into the second input register R7.

Similarly, for the second path VC2 at rank m=n, consider the products {b_(k)x_(m−k)}_(k)^(m=n) for the same input data x_(n−1), x_(n−2). Their indices satisfy n−k=n−1 and n−(k+1)=n−2, so k=1 and k+1=2. The corresponding coefficients b₁ and b₂ are loaded into the third input register R6.

This allows one 64-bit loading, into the common register R3, to be shared between the two paths VC1, VC2 in each operation. As a result, three 64-bit loadings are sufficient to carry out four multiplications-accumulations per operation across the two paths VC1, VC2.
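The loading budgets stated in this description can be compared per multiply-accumulate. The following Python sketch only tabulates the per-operation counts given in the text and does not model memory timing. The scheme labels and helper names are hypothetical.

```python
from fractions import Fraction

# (64-bit loadings per operation, multiply-accumulates per operation),
# as stated in the description
SCHEMES = {
    "fig1_single_path": (2, 2),        # one path with paired registers
    "fig1_shared_r3": (3, 4),          # two paths sharing the sample register R3
    "two_paths_three_regs": (2, 4),    # FIGS. 2 to 5
    "three_paths_three_regs": (2, 6),  # third path added
}


def loadings_per_mac(scheme):
    """Register loadings needed per multiply-accumulate for a given scheme."""
    loads, macs = SCHEMES[scheme]
    return Fraction(loads, macs)


def total_loadings(scheme, n_macs):
    """Loadings for n_macs multiply-accumulates (assumes n_macs is a multiple of
    the scheme's MACs per operation)."""
    loads, macs = SCHEMES[scheme]
    return n_macs // macs * loads


for name in SCHEMES:
    print(name, loadings_per_mac(name))
```

For example, 1200 multiply-accumulates need 1200, 900, 600 and 400 loadings respectively under the four schemes.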

Reference is now made to FIGS. 2 to 5, which illustrate an advantageous example in which the number of loadings into the input registers REGin (FIG. 1) is reduced to two, while four multiplications-accumulations are still carried out in each operation across the two paths VC1, VC2.

FIG. 2 illustrates the implementation principle that allows an additional loading into the input registers R3, R7, R6 to be shared between the first calculation path VC1 and the second calculation path VC2 described above in relation to FIG. 1.

On the one hand, the first rank n−1 of the first output component Y_(n−1), calculated by the first path VC1, and the second rank n of the second output component Y_(n), calculated by the second path VC2, are advantageously consecutive. Here, the second rank n is higher than and directly consecutive to the first rank n−1, which corresponds to the case described in relation to FIG. 1.

In FIGS. 2, 4 and 6, the equations Y_(n−1), Y_(n) are shown as the expansion of the formula Y_(m)=Σ_(k=0)^(N−1)b_(k)x_(m−k) for the respective ranks n−1, n. The fill pattern of each letter corresponds to the pattern of the input register R3, R7 or R6 that contains the respective factor. The “empty” or “white” pattern means that the factor is not loaded into a register during the operation concerned.

During a first operation OP1, performed with the input data x_(n−1), x_(n−2), the coefficients b₀ and b₁ used by the first path VC1 are contained in the second register R7. For the same input data x_(n−1), x_(n−2), the coefficient b₁ used by the second path VC2 is contained in the second register R7, and the coefficient b₂ used by the second path VC2 is contained in the third register R6.

Reference is made to FIG. 3, which illustrates the hardware calculator CAL in the context of the implementation of the first operation OP1.

In the first path VC1, the multiplication-accumulation operation is implemented as described in relation to FIG. 1. The two products {b_(k)x_(m−k)}_([k;k+1])^(m=n−1) between the first factors {x_(n−1), x_(n−2)} contained in the first register R3 and the second factors {b₀, b₁} contained in the second register R7 are accumulated in the output register Ro of the first path VC1.

In detail, the product of the first factor x_(n−1) and the second factor b₀ is accumulated in the right portion Rod of the output register. It is formed from the data contained in the right portion R3d of the first register and in the right portion R7d of the second register. This multiplication-accumulation operation is written “Rod+=R3d*R7d”, where “+=” means “accumulation” and “*” means “multiplication”. Likewise, the product of the other first factor x_(n−2) and the other second factor b₁ is accumulated in the left portion Rog of the output register, written “Rog+=R3g*R7g”.

In the second path VC2, the multiplication-accumulation operation is implemented such that the two products {b_(k)x_(m−k)}_([k;k+1])^(m=n) between the first factors {x_(n−1), x_(n−2)} contained in the first register R3 and the second factors {b₁, b₂} contained in the second register R7 and in the third register R6 are accumulated in the output register R4 of the second path VC2.

In detail, in the second path VC2, the product of the first factor x_(n−1) and the second factor b₁ is accumulated in the right portion R4d of the output register. It is formed from the data contained in the right portion R3d of the first register and in the left portion R7g of the second register, written “R4d+=R3d*R7g”. The product of the other first factor x_(n−2) and the other second factor b₂ is accumulated in the left portion R4g of the output register. It is formed from the data contained in the left portion R3g of the first register and in the right portion R6d of the third register, written “R4g+=R3g*R6d”.

To this end, the hardware calculator CAL may include a selection circuit SWT configured to distribute the accesses to the second register R7 and to the third register R6 between the first calculation path VC1 and the second calculation path VC2. The selection circuit SWT is configured to distribute the left and right portions of said registers R7, R6 separately.

In the first operation OP1, the distribution is as follows:

the left portion MACg of the first path VC1 receives as inputs the left portion R3g of the first register and the left portion R7g of the second register;

the right portion MACd of the first path VC1 receives as inputs the right portion R3d of the first register and the right portion R7d of the second register;

the left portion MACg of the second path VC2 receives as inputs the left portion R3g of the first register and the right portion R6d of the third register;

the right portion MACd of the second path VC2 receives as inputs the right portion R3d of the first register and the left portion R7g of the second register.

Thus the first path VC1 has access to the second factors {b_(j), b_(j+1)} of the corresponding operation, contained in the second register R7. The second path VC2 has access to the second factors {b_(j+1), b_(j+2)} of the corresponding operation, contained in the second register R7 (portion R7g) and in the third register R6 (portion R6d).

Simultaneously with the parallel multiplication-accumulation calculations in the two calculation paths VC1, VC2, the hardware calculator CAL is configured to perform two loadings:

in the first register R3, a loading LD(x_(n=n−2)) of the pair of first factors {x_(n−3), x_(n−4)} corresponding to the calculation of the next operation OP2 (FIGS. 4 and 5) of the first path VC1 and of the second path VC2;

in the second register R7, only one other loading LD(b_(k=k+4)) of the pair of second factors {b₄, b₅}, corresponding to the calculation, in the first path VC1, of the operation that follows the next operation OP2.

Consequently, during the next operation OP2, the third register R6 and the second register R7 will contain the second factors {b₂, b₃} and {b₄, b₅} respectively. This is comparable to their content at the beginning of the first operation OP1.

Reference is made in this regard to FIG. 4.

FIG. 4 illustrates the principle of implementation of the calculations in the first path VC1 and in the second path VC2 during the operation following the first operation OP1, that is to say during the second operation OP2.

It is recalled that, at the end of the loadings performed during the first operation OP1:

the first input register R3 contains the input data x_(n−3), x_(n−4);

the second input register R7 contains the coefficients b₄ and b₅. In the first path VC1, these correspond to the products {b_(k)x_(m−k)}_(k)^(m=n−1) for the input data x_(n−5), x_(n−6) of the operation following the second operation OP2;

the third register R6 has not been loaded with new data and therefore still contains the coefficients b₂ and b₃. In the first path VC1, these correspond to the products {b_(k)x_(m−k)}_(k)^(m=n−1) for the input data x_(n−3), x_(n−4) of the operation following the first operation OP1, that is to say the second operation OP2 now in progress.

During the second operation OP2, with the input data x_(n−3), x_(n−4), the coefficients b₂ and b₃ used by the first path VC1 are contained in the third register R6. For the same input data x_(n−3), x_(n−4), the coefficient b₃ used by the second path VC2 is contained in the third register R6, and the coefficient b₄ used by the second path VC2 is contained in the second register R7.

Reference is made to FIG. 5, which illustrates the hardware calculator CAL in the context of the implementation of the second operation OP2, following the first operation OP1.

In the first path VC1, the multiplication-accumulation operation is implemented such that the two products {b_(k)x_(m−k)}_([k;k+1])^(m=n−1) between the first factors {x_(n−3), x_(n−4)} contained in the first register R3 and the second factors {b₂, b₃} contained in the third register R6 are accumulated in the output register Ro of the first path VC1.

In detail, the multiplication-accumulation in the right portion Rod of the output register of the first path VC1 is written “Rod+=R3d*R6d”. The multiplication-accumulation in the left portion Rog of the output register is written “Rog+=R3g*R6g”.

In the second path VC2, the multiplication-accumulation operation is implemented such that the two products {b_(k)x_(m−k)}_([k;k+1])^(m=n) between the first factors {x_(n−3), x_(n−4)} contained in the first register R3 and the second factors {b₃, b₄} contained in the third register R6 and in the second register R7 are accumulated in the output register R4 of the second path VC2.

In detail, the multiplication-accumulation in the right portion R4d of the output register of the second path VC2 is written “R4d+=R3d*R6g”. The multiplication-accumulation in the left portion R4g of the output register is written “R4g+=R3g*R7d”.

Here again, the selection circuit SWT is configured to make the distribution as follows:

the left portion MACg of the first path VC1 receives as inputs the left portion R3g of the first register and the left portion R6g of the third register;

the right portion MACd of the first path VC1 receives as inputs the right portion R3d of the first register and the right portion R6d of the third register;

the left portion MACg of the second path VC2 receives as inputs the left portion R3g of the first register and the right portion R7d of the second register;

the right portion MACd of the second path VC2 receives as inputs the right portion R3d of the first register and the left portion R6g of the third register.
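The routings listed for OP1 (FIG. 3) and OP2 (FIG. 5) can be summarised as a lookup table. In this hypothetical Python sketch, odd operations use R7 as the “second” register and R6 as the “third”, and even operations swap them. This follows from the switching of register functions at each successive operation described in this document. The function name and table layout are illustrative assumptions, not the actual structure of the selection circuit SWT.

```python
def swt_routing(op_number):
    """Register halves fed to each MAC unit in operation OP<op_number> (1-based).

    Odd operations use R7 as the "second" register and R6 as the "third";
    even operations swap the two functions.
    """
    second, third = ("R7", "R6") if op_number % 2 == 1 else ("R6", "R7")
    return {
        ("VC1", "MACg"): ("R3g", second + "g"),  # x_(i-1) * b_(j+1)
        ("VC1", "MACd"): ("R3d", second + "d"),  # x_(i)   * b_(j)
        ("VC2", "MACg"): ("R3g", third + "d"),   # x_(i-1) * b_(j+2)
        ("VC2", "MACd"): ("R3d", second + "g"),  # x_(i)   * b_(j+1)
    }


for (path, mac), (a, b) in swt_routing(2).items():
    print(f"OP2 {path}.{mac} <- {a} x {b}")
```

Only the two tap registers change roles. The sample register R3 is routed identically in every operation.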

And here again, simultaneously with the parallel multiplication-accumulation calculations in the two calculation paths VC1, VC2, the hardware calculator CAL is configured to perform, in the first register R3, a loading LD(x_(n=n−2)) of the pair of first factors {x_(n−5), x_(n−6)} corresponding to the calculation of the next operation (i.e. the operation after the second operation OP2) of the first path VC1 and of the second path VC2; and, in the third register R6, only one other loading LD(b_(k=k+4)) of the pair of second factors {b₆, b₇} corresponding to the calculation of the operation following the next operation (i.e. following the operation after the second operation OP2) of the first path VC1.

Consequently, during the next operation (after the second operation OP2), the second register R7 and the third register R6 will contain the second factors {b₄, b₅} and {b₆, b₇} respectively, in a manner comparable to their content at the start of the first operation OP1 and at the start of the second operation OP2: one register holds the pair used by the current operation and the other holds the pair for the next operation.

The implementation of the operation following the second operation OP2 is strictly identical to that of the first operation OP1 from the point of view of the accesses to and loadings of the input registers R3, R6, R7, but this time with the data corresponding to the respective advancement of the index k in the sum Σ_(k=0) ^(N−1)b_(k)x_(m−k) of rank m=n−1.

It will be noted that, between the first operation OP1 and the second operation OP2, the actions performed in the second register R7 and in the third register R6 are simply switched, that is to say strictly exchanged: the actions performed in R7 during one operation are performed in R6 during the other, and vice versa.

Consequently, in the series of operations, the hardware calculator CAL is configured to switch the functions of the second register R7/R6 and the third register R6/R7 at each successive operation. In other words, for each new operation, the third register becomes the second register and the second register becomes the third register, and the same actions are performed in the “new” second register and in the “new” third register.

From another point of view, it can be considered that the hardware calculator CAL always performs exactly the same actions in “one” second register and in “one” third register for each operation. Indeed, in the first operation OP1 (and the “odd” operations), the second register is the register with reference R7 and the third register is the register with reference R6; while in the second operation OP2 (and the “even” operations), the exact same actions are executed in a second register and in a third register, the second register being the register with reference R6 and the third register being the register with reference R7.
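The series of operations, including the switching of R7 and R6 between operations, can be simulated in software to check that the two paths reproduce Y_(n−1) and Y_(n). The following is a minimal Python sketch under stated assumptions (`load_pair`, `direct` and `two_path_fir` are hypothetical names). It assumes that N is even, that n ≥ N so that no circular wrap-around is needed, and that out-of-range loads read as zero, which also stands in for the non-existent tap b_(N). It also assumes the first element of each pair sits in the right half d. The term b₀x_(n) of Y_(n) is not reached by any sample pair in the series, and this section does not say how it is handled, so the sketch adds it in a separate step.

```python
from typing import List, Tuple

D, G = 0, 1  # index of the right ("d") and left ("g") half of a register tuple

def load_pair(seq: List[float], i: int, j: int) -> Tuple[float, float]:
    """Load the pair {seq[i], seq[j]} as (d, g); out-of-range reads give 0.0."""
    def get(k: int) -> float:
        return seq[k] if 0 <= k < len(seq) else 0.0
    return (get(i), get(j))

def direct(b: List[float], x: List[float], m: int) -> float:
    """Reference value Y_m = sum of b[k] * x[m - k] for k = 0 .. N-1."""
    return sum(b[k] * x[m - k] for k in range(len(b)))

def two_path_fir(b: List[float], x: List[float], n: int) -> Tuple[float, float]:
    """Return (Y[n-1], Y[n]) using one sample register and two tap registers."""
    N = len(b)
    if N % 2 or n < N or n >= len(x):
        raise ValueError("sketch assumes even N and N <= n < len(x)")
    # Initialisation: three loads.
    r3 = load_pair(x, n - 1, n - 2)  # first factors of the first operation
    cur = load_pair(b, 0, 1)         # taps of the current operation (R7 in OP1)
    nxt = load_pair(b, 2, 3)         # taps of the next operation (R6 in OP1)
    acc1 = acc2 = 0.0                # output registers of VC1 and VC2
    for p in range(N // 2):
        # Four MACs, all reading the same sample pair in R3.
        acc1 += r3[D] * cur[D] + r3[G] * cur[G]  # terms k = 2p, 2p+1 of Y[n-1]
        acc2 += r3[D] * cur[G] + r3[G] * nxt[D]  # terms k = 2p+1, 2p+2 of Y[n]
        # Two loads (concurrent with the MACs in hardware, sequential here).
        r3 = load_pair(x, n - 3 - 2 * p, n - 4 - 2 * p)  # samples, next op
        cur = load_pair(b, 2 * p + 4, 2 * p + 5)         # taps, op after next
        cur, nxt = nxt, cur  # role switch between the two tap registers
    acc2 += b[0] * x[n]  # k = 0 term of Y[n], added separately (assumption)
    return acc1, acc2

b = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0]
x = [float(v) for v in range(1, 13)]
print(two_path_fir(b, x, 8))             # (27.5, 33.0)
print(direct(b, x, 7), direct(b, x, 8))  # 27.5 33.0
```

The loads are placed after the MACs inside each iteration so that the MACs read the old register contents, which models the loads being issued alongside the calculations.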

In summary, a technique of hardware calculation of multiplication-accumulation operations has been described in which, for each operation progressively composing the first output component of lower rank n−1, the first factors {x_(i), x_(i−1)} are loaded into a first register R3 and the second factors {b_(j), b_(j+1)} are loaded into a second register R7/R6. A third register R6/R7 is further provided to contain the next second factors {b_(j+2), b_(j+3)} for the next operation of the first output component of lower rank n−1. At the same time, the multiplication-accumulation operations for the second output component of higher consecutive rank n are performed between the same first factors {x_(i), x_(i−1)} loaded in the first register R3 and the second factors {b_(j+1), b_(j+2)} corresponding to these first factors, which are distributed between the second register R7/R6 and the third register R6/R7.

Consequently, the hardware calculation technique thus summarised allows the four multiplication-accumulations of each operation, across the two paths VC1, VC2, to be carried out with only two loadings into the input registers R3, R7/R6 (REGin, FIG. 1).

It will be noted that an initialisation phase of the series, before the very first operation of the series, comprises three loadings of the input registers R3, R7, R6 with the corresponding data, for example as represented in FIG. 2. Then, throughout the implementation of the series of operations, each operation includes only two loadings.
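This load budget can be tallied over a whole series. The following is a minimal Python sketch under stated assumptions: the number of taps N is even, so one series has N/2 operations. It counts the two loads of every operation, including those near the end of the series that would fetch data no longer used, since the text states that each operation includes two loadings. `series_cost` is a hypothetical name.

```python
from typing import Tuple

def series_cost(num_taps: int, paths: int = 2) -> Tuple[int, int]:
    """Return (register loads, multiply-accumulates) for one series.

    Assumptions: num_taps is even, so the series has num_taps // 2
    operations; each operation performs 2 MACs per path and 2 loads;
    the series starts with 3 initialisation loads (R3, R7, R6).
    """
    if num_taps <= 0 or num_taps % 2:
        raise ValueError("this sketch assumes a positive, even number of taps")
    ops = num_taps // 2
    loads = 3 + 2 * ops
    macs = 2 * paths * ops
    return loads, macs

print(series_cost(32))           # (35, 64)
print(series_cost(32, paths=3))  # (35, 96)
```

For 32 taps, the series performs 64 multiply-accumulates with 35 loads. The same loads would feed 96 multiply-accumulates if a third path is added.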

Furthermore, in the developed expressions of the equations Y_(n−1), Y_(n) of the formula Y_(m)=Σ_(k=0) ^(N−1)b_(k)x_(m−k) illustrated in FIGS. 2, 4 and 6, the indices k of the coefficients b_(k) all range from 0 to N−1, while the indices m−k of the input data x_(m−k) are noted so as to go from the corresponding rank m down to the rank m−(N−1). It is understood that the index values, such as possibly the value denoted m−(N−1), are considered modulo (N−1), such that when m−k<0 the effective indices are equal to (N−1)+(m−k), given that the input vector {x_(i)}_(i) typically has no negative indices i. In this regard, the values of the input data {x_(i)}_(i) can typically be stored in a circular register, which implements the modulo function by construction.
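In software, the wrap-around that a circular register provides by construction reduces to one modulo operation. The following is a minimal Python sketch that assumes the buffer length is used as the modulus (N−1 in the convention stated above). `effective_index` and `read_sample` are hypothetical names.

```python
from typing import Sequence

def effective_index(idx: int, modulus: int) -> int:
    """Wrap a sample index m - k into the range [0, modulus).

    For -modulus <= idx < 0 this equals modulus + idx, i.e. the rule
    (N-1) + (m-k) stated above when modulus = N - 1. Python's % already
    returns a non-negative result for a positive modulus.
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    return idx % modulus

def read_sample(buf: Sequence[float], idx: int) -> float:
    """Read x[idx] from a circular buffer whose length is the modulus."""
    return buf[effective_index(idx, len(buf))]

N = 8                              # number of taps, so the modulus is N - 1 = 7
print(effective_index(-3, N - 1))  # 4, i.e. (N-1) + (m-k) with m-k = -3
```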

FIG. 6 illustrates a complementary example corresponding to a case where the hardware calculator further includes a third calculation path (not shown), similar to the first calculation path VC1 and to the second calculation path VC2, adapted to calculate in parallel a third output component Y_(n+1) of a third rank n+1, which is higher than and consecutive to the second rank n, according to the same formula Y_(m)=Σ_(k=0) ^(N−1)b_(k)x_(m−k).

For each operation, the hardware calculator is configured to calculate and accumulate, in a third output register (not shown), a pair of two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n+1) between the same first factors contained in the first register R3 (for example {x_(n−1), x_(n−2)} in the first operation) and the second factors {b_(j+2), b_(j+3)} contained in the third register (R6 in the first operation).

In a first operation OP11, the multiplication-accumulation operations performed for the first output component Y_(n−1) by the first path VC1, and for the second output component Y_(n) by the second path VC2, correspond exactly to the first operation OP1 described previously in relation to FIGS. 2 and 3.

Recall that, in the first operation OP1 described previously in relation to FIGS. 2 and 3, the third register R6 contains the coefficients {b₂, b₃}.

Now, for the terms of the third output component Y_(n+1) computed by the third path, the second factors of indices k, k+1 corresponding to the products {b_(k)x_(m−k)}_(k) ^(m=n+1) with the input data x_(n−1), x_(n−2) for the rank n+1 (i.e. (n+1)−k=n−1 and (n+1)−(k+1)=n−2, so k=2 and k+1=3) are precisely the coefficients {b₂, b₃} already contained in the third register R6.

The same loadings as those described in relation to FIGS. 2 and 3 are made in the registers R3 and R7 during the first operation OP11.

Then, in a second operation following the first operation OP11, the first register R3 contains the first factors {x_(n−3), x_(n−4)}, the second register R7 contains the second factors {b₄, b₅} and the third register R6 contains the second factors {b₂, b₃}.

Thus, here again, the multiplication-accumulation operations performed for the first output component Y_(n−1) by the first path VC1, and for the second output component Y_(n) by the second path VC2, correspond exactly to the second operation OP2 described previously in relation to FIGS. 4 and 5.

And, for the terms of the third output component Y_(n+1) computed by the third path, the second factors of indices k, k+1 corresponding to the products {b_(k)x_(m−k)}_(k) ^(m=n+1) with the input data x_(n−3), x_(n−4) for the rank n+1 (i.e. (n+1)−k=n−3 and (n+1)−(k+1)=n−4, so k=4 and k+1=5) are the coefficients {b₄, b₅} contained in the second register R7.

Consequently, by means of an additional third calculation path, the hardware calculator can calculate six multiplication-accumulation terms for only two factor loadings into said registers at each operation.
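The three-path variant can be checked with a short software simulation. The following is a minimal Python sketch under stated assumptions (all names hypothetical). It assumes that N is even, that n ≥ N and x_(n+1) exists so that no circular wrap-around is needed, that out-of-range loads read as zero, and that the first element of each pair sits in the right half d of its register. The third path reads both halves of the register holding the next pair, as described for FIG. 6. The leading terms b₀x_(n) of Y_(n), and b₀x_(n+1) and b₁x_(n) of Y_(n+1), are not reached by any operation of the series. This section does not say how the hardware handles them, so the sketch adds them separately.

```python
from typing import List, Tuple

D, G = 0, 1  # index of the right ("d") and left ("g") half of a register tuple

def load_pair(seq: List[float], i: int, j: int) -> Tuple[float, float]:
    """Load the pair {seq[i], seq[j]} as (d, g); out-of-range reads give 0.0."""
    def get(k: int) -> float:
        return seq[k] if 0 <= k < len(seq) else 0.0
    return (get(i), get(j))

def three_path_fir(b: List[float], x: List[float], n: int) -> Tuple[float, float, float]:
    """Return (Y[n-1], Y[n], Y[n+1]) with six MACs and two loads per operation."""
    N = len(b)
    if N % 2 or n < N or n + 1 >= len(x):
        raise ValueError("sketch assumes even N and N <= n < len(x) - 1")
    r3 = load_pair(x, n - 1, n - 2)                    # initialisation:
    cur, nxt = load_pair(b, 0, 1), load_pair(b, 2, 3)  # three loads
    y = [0.0, 0.0, 0.0]                                # three output registers
    for p in range(N // 2):
        y[0] += r3[D] * cur[D] + r3[G] * cur[G]  # first path, rank n-1
        y[1] += r3[D] * cur[G] + r3[G] * nxt[D]  # second path, rank n
        y[2] += r3[D] * nxt[D] + r3[G] * nxt[G]  # third path, rank n+1
        r3 = load_pair(x, n - 3 - 2 * p, n - 4 - 2 * p)  # samples, next op
        cur = load_pair(b, 2 * p + 4, 2 * p + 5)         # taps, op after next
        cur, nxt = nxt, cur                              # role switch
    # Leading terms not reached by any sample pair in the series (assumption).
    y[1] += b[0] * x[n]
    y[2] += b[0] * x[n + 1] + b[1] * x[n]
    return y[0], y[1], y[2]

b = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0]
x = [float(v) for v in range(1, 13)]
print(three_path_fir(b, x, 8))  # (27.5, 33.0, 38.5)
```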

FIG. 7 illustrates an exemplary embodiment of the hardware calculator CAL described previously in relation to FIGS. 2 to 6. The hardware calculator CAL belongs, for example, to a digital signal processor DSP produced in an integrated manner within an integrated circuit, such as a microcontroller MCU.

While this invention has been described with reference to illustrative embodiments, this description is not intended to be construed in a limiting sense. Various modifications and combinations of the illustrative embodiments, as well as other embodiments of the invention, will be apparent to persons skilled in the art upon reference to the description. It is therefore intended that the appended claims encompass any such modifications or embodiments.

What is claimed is:
 1. An integrated circuit comprising: a hardware calculator configured to calculate in parallel a first output component Y_(n−1) of a first rank n−1 and a second output component Y_(n) of a second rank n which is higher than and consecutive to the first rank, according to the formula: Y_(m)=Σ_(k=0) ^(N−1)b_(k)x_(m−k), in a series of operations, wherein the hardware calculator includes a first calculation path dedicated to the first output component Y_(n−1), a second calculation path dedicated to the second output component Y_(n), wherein, for each operation, a first register is configured to contain a pair of first factors {x_(i), x_(i−1)} corresponding to terms {b_(k)x_(m−k)}_([k;k+1]) ^(m=n−1) of an operation in the first path, a second register is configured to contain a pair of second factors {b_(j), b_(j+1)} corresponding to terms {b_(k)x_(m−k)}_([k;k+1]) ^(m=n−1) of the operation in the first path, and a third register is configured to contain a pair of second factors {b_(j+2), b_(j+3)} corresponding to terms {b_(k)x_(m−k)}_([k+2;k+3]) ^(m=n−1) of the next operation in the first path, and wherein the two calculation paths are configured to each access the first register, the second register and the third register, so as to use, in each operation, the first factors x_(m−k) and the second factors b_(k) at the corresponding position of the summation index 0≤k≤N−1 in the formula of rank m=n−1, m=n respective to each of the output components Y_(n−1), Y_(n).
 2. The integrated circuit according to claim 1, wherein, for each operation, the first calculation path is configured to calculate and accumulate, in a first output register, the pair of two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n−1) between the first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j), b_(j+1)} contained in the second register, wherein, for each operation, the second calculation path is configured to calculate and accumulate, in a second output register, the pair of two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+1), b_(j+2)} corresponding to the calculation of the second rank n, contained in the second register and in the third register, and wherein, for each operation, the hardware calculator is configured to load, into the first register, a pair of first factors {x_(i−2), x_(i−3)} corresponding to the calculation of the next operation of the first path, and to load, into the second register, a pair of second factors {b_(j+4), b_(j+5)} corresponding to the calculation of the operation following the next operation of the first path.
 3. The integrated circuit according to claim 1, wherein the hardware calculator is configured, in the series of operations, to switch the functions of the second register and of the third register for each successive operation.
 4. The integrated circuit according to claim 1, wherein the first register, the second register, the third register, the first output register and the second output register have a size of 2M bits and each contains a pair of two data items encoded on M bits.
 5. The integrated circuit according to claim 1, wherein the hardware calculator includes a selection circuit configured to distribute accesses to the second register and to the third register between the first calculation path and the second calculation path, such that the first path has access to the second factors {b_(j), b_(j+1)} corresponding to the operation contained in the second register, and that the second path has access to the second factors {b_(j+1), b_(j+2)} corresponding to the operation contained in the second register and in the third register.
 6. The integrated circuit according to claim 1, wherein the hardware calculator is configured to calculate in parallel a third output component Y_(n+1) of a third rank n+1 which is higher than and consecutive to the second rank n, according to the same formula, the hardware calculator including a third calculation path dedicated to the third output component Y_(n+1), and wherein, for each operation, the hardware calculator is further configured to calculate and accumulate, with the third path, in a third output register, a pair of two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n+1) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+2), b_(j+3)} corresponding to the calculation of the third rank n+1, contained in the third register.
 7. The integrated circuit according to claim 1, wherein the integrated circuit is a digital signal processor.
 8. An integrated circuit comprising: a hardware calculator configured to calculate in parallel a first output component Y_(n−1) of a first rank n−1 and a second output component Y_(n) of a second rank n which is higher than and consecutive to the first rank, according to the formula: Y_(m)=Σ_(k=0) ^(N−1)b_(k)x_(m−k), in a series of operations, wherein the hardware calculator includes a first calculation path dedicated to the first output component Y_(n−1), a second calculation path dedicated to the second output component Y_(n), a first register configured to contain a pair of first factors {x_(i), x_(i−1)}, a second register configured to contain a pair of second factors {b_(j), b_(j+1)} and a third register configured to contain a pair of second factors {b_(j+2), b_(j+3)}, wherein the hardware calculator is, for each operation, configured to: calculate and accumulate, with the first path, in a first output register, a pair of two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n−1) between the first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j), b_(j+1)} contained in the second register; calculate and accumulate, with the second path, in a second output register, a pair of two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+1), b_(j+2)} corresponding to the calculation of the second rank n, contained in the second register and in the third register; load, into the first register, a pair of first factors {x_(i−2), x_(i−3)} corresponding to the calculation of the next operation of the first path; and load, into the second register, a pair of second factors {b_(j+4), b_(j+5)} corresponding to the calculation of the operation following the next operation of the first path.
 9. The integrated circuit according to claim 8, wherein the hardware calculator is configured, in the series of operations, to switch the functions of the second register and of the third register for each successive operation.
 10. The integrated circuit according to claim 8, wherein the first register, the second register, the third register, the first output register and the second output register have a size of 2M bits and each contains a pair of two data items encoded on M bits.
 11. The integrated circuit according to claim 8, wherein the hardware calculator includes a selection circuit configured to distribute accesses to the second register and to the third register between the first calculation path and the second calculation path, such that the first path has access to the second factors {b_(j), b_(j+1)} corresponding to the operation contained in the second register, and that the second path has access to the second factors {b_(j+1), b_(j+2)} corresponding to the operation contained in the second register and in the third register.
 12. The integrated circuit according to claim 8, wherein the hardware calculator is configured to calculate in parallel a third output component Y_(n+1) of a third rank n+1 which is higher than and consecutive to the second rank n, according to the same formula, the hardware calculator including a third calculation path dedicated to the third output component Y_(n+1), and wherein, for each operation, the hardware calculator is further configured to calculate and accumulate, with the third path, in a third output register, a pair of two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n+1) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+2), b_(j+3)} corresponding to the calculation of the third rank n+1, contained in the third register.
 13. The integrated circuit according to claim 8, wherein the integrated circuit is a digital signal processor.
 14. A method of parallel calculation of a first output component Y_(n−1) of a first rank n−1 and of a second output component Y_(n) of a second rank n, which is higher than and consecutive to the first rank, according to the formula: Y_(m)=Σ_(k=0) ^(N−1)b_(k)x_(m−k), in a series of operations, the method comprising, for each operation: calculating, by a hardware calculator, the first component Y_(n−1) by accumulating two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n−1) between first factors {x_(i), x_(i−1)} contained in a first register and second factors {b_(j), b_(j+1)} contained in a second register; calculating, by the hardware calculator, the second component Y_(n) by accumulating two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+1), b_(j+2)} corresponding to a calculation of the second rank n, contained in the second register and in a third register; loading, by the hardware calculator, into the first register, a pair of first factors {x_(i−2), x_(i−3)} corresponding to the calculation of a next operation; and loading, by the hardware calculator, into the second register, a pair of second factors {b_(j+4), b_(j+5)} corresponding to the calculation of the operation following the next operation of the first path.
 15. The method according to claim 14, further comprising, in the series of operations, switching the functions of the second register and of the third register for each successive operation.
 16. The method according to claim 14, wherein the first register, the second register, the third register, the first output register and the second output register have a size of 2M bits and each contain a pair of two data items encoded on M bits.
 17. The method according to claim 14, wherein accesses to the second register and to the third register are distributed such that a first calculation path dedicated to the first output component Y_(n−1) has access to the second factors {b_(j), b_(j+1)} corresponding to said operation contained in the second register, and that a second calculation path dedicated to the second output component Y_(n) has access to the second factors {b_(j+1), b_(j+2)} corresponding to said operation contained in the second register and in the third register.
 18. The method according to claim 14, further comprising a parallel calculation of a third output component Y_(n+1) of a third rank n+1, which is higher than and consecutive to the second rank n, according to the same formula, wherein, for each operation, the method further comprises calculating the third component Y_(n+1) by calculating and accumulating two products {b_(k)x_(m−k)}_([k;k+1]) ^(m=n+1) between the same first factors {x_(i), x_(i−1)} contained in the first register and the second factors {b_(j+2), b_(j+3)} corresponding to the calculation of the third rank n+1, contained in the third register.